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Abstract. The convergence rate is analyzed for the SpaSRA algorithm (Sparse Reconstruction 
by Separable Approximation) for minimizing a sum /(x) + i/>(x) where / is smooth and ip is convex, 
but possibly nonsmooth. It is shown that if / is convex, then the error in the objective function at 
iteration k, for k sufficiently large, is bounded by a/(b + k) for suitable choices of a and b. Moreover, 
if the objective function is strongly convex, then the convergence is fj-linear. An improved version 
of the algorithm based on a cycle version of the BB iteration and an adaptive line search is given. 
The performance of the algorithm is investigated using applications in the areas of signal processing 
£Nj ■ and image reconstruction. 
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1. Introduction. In this paper we consider the following optimization problem 

O : 

.. min 0(x):=/(x)+^x, 1.1) 

~ . x£t" 
-I— > . 

where / : R™ — > R is a smooth function, and ip : R™ — > R is convex. The function ip, 
usually called the regularizer or regularization function, is finite for all x £ R™, but 
possibly nonsmooth. An important application of (|1.1|) , found in the signal processing 
I \ literature, is the well-known £2 — (-i problem (called basis pursuit denoising in [7]) 

>' 1 
O: min -||Ax-b||2 + r||x|| 1; (1.2) 

no- 
where A £ R kxn (usually k<n),h£ R fc , r £ R, r > 0, and || • ||i is the 1-norm. 

Recently, Wright, Nowak, and Figueiredo [53] introduced the Sparse Reconstruc- 
tion by Separable Approximation algorithm (SpaRSA) for solving (|1.1|) . The algo- 

Q\ I rithm has been shown to work well in practice. In [24] the authors establish global 

convergence of SpaRSA. In this paper, we prove an estimate of the form a/(b + k) 
for the error in the objective function when / is convex. If the objective function 

• *h . is strongly convex, then the convergence of the objective function and the iterates is 

at least R-linear. A strategy is presented for improving the performance of SpaRSA 
based on a cyclic Barzilai-Borwein step [8] [9] [T3J [19] and an adaptive choice [15] for 
the reference function value in the line search. The paper concludes with a series of 
numerical experiments in the areas of signal processing and image reconstruction. 

Throughout the paper V/(x) denotes the gradient of /, a row vector. The gra- 
dient of /(x), arranged as a column vector, is g(x). The subscript k often represents 
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the iteration number in an algorithm, and g^ stands for g(xj.). || • || denotes || • H2, the 
Euclidean norm. dip(y) is the subdifferential at y, a set of row vectors. If p G dtp(y), 
then 

-0W > ip(y) +p(x-y) 

for all x £ R". 

2. The SpaRSA algorithm. The SpaRSA algorithm, as presented in [24], is 
as follows: 

Sparse Reconstruction by Separable Approximation (SpaRSA) 
Given r\ > 1 , a G (0,1), [a m i n ,a max ] C (0, 00) , and starting guess xi . 
Set k = 1 . 
Step 1. Choose a G [a min , a max ] 

Step 2. Set a = r) J ao where j >0 is the smallest integer such that 

(^(xfc+i) <4>k~ cra||x fc+ i - x fc || 2 where 

x fc+ i =argmin{V/(x fc )z + a||z-x fc || 2 +V(z) :zeR"}. 

Step 3. If Xfc + i = Xfe , terminate. 

Step 4. Set fc = fc + l and go to step 1. 



The parameter ao in [24] was taken to be the BB parameter [T] with safeguards: 

= af B = min {||as fe - y fe || : a min < a < a max } (2.1) 

where = Xfc — x^-i and y^ = gfc — gfe-i. Also, in [24], the reference value <j>^ is 
the GLL [14] reference value </>™ ax defined by 

0^ ax = max{^(x fe _ i ) : < j < min(fc, M)}. (2.2) 

In other words, at iteration k, 0™ ax is the maximum of the M most recent values for 
the objective function. Note that if x^+i = X&, then 

e V/(x fe ) + 5V>(x fc+1 ) = V/(x fc+1 ) + ^(xfc+i). 

Hence, x^+i = Xfc is a stationary point. 

The overall structure of the SpaRSA algorithm is closely related to that of the 
Iterative Shrinkage Thresholding Algorithm (ISTA) \M [HI [TBI US] . ISTA, however, 
employs a fixed choice for a related to the Lipschitz constant for /, while SpaRSA 
employs a nonmonotone line search. A sublinear convergence result for a monotone 
line search version of ISTA is given by Beck and Teboulle [2 j and by Nesterov [18] . In 
Section [3] we give a sublinear convergence result for the nonmonotone SpaRSA, while 
Section [4] gives a linear convergence result when the objective function is strongly 
convex. 

In [24] it is shown that the line search in Step 2 terminates for a finite j when / is 
Lipschitz continuously differentiable. Here we weaken this condition by only requiring 
Lipschitz continuity over a bounded set. 

Proposition 2.1. Let C be the level set defined by 

£ = {xe R" : <P(k) < (2.3) 
We make the following assumptions: 
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(Al) The level set C is contained in the interior of a compact, convex set K., and 

f is Lipschitz continuously differentiable on /C. 
(A2) ip is convex and ?/>(x) is finite for all x £ R™. 
If 4>(x.k) < 4>h ^ <M x i); then there exists a with the property that 

0(x fe+ i) <<t>k~ ^allxfc+i - x fc || 2 

whenever a > a where Xfc + i is obtained as in Step 2 of SpaRSA. 
Proof. Let $fe be defined by 

$ fe (z) = /(x fc ) + V/(x fc )(z - x fe ) + a\\z - x fc || 2 + V(z), 

where a > 0. Since is a strongly convex quadratic, its level sets are compact, and 
the minimizer Xfe + i in Step 2 exists. Since Xfe + i is the minimizer of we have 

$fe(x/c+i) = /(xfe) + V/(xfe)(x fe+ i - Xfe) + a||xfc+i - Xfe || 2 + -0(x fe+ i) 
< $ fe (x A; ) = /(x fe ) + V^Xfc). 

This is rearranged to obtain 

a||x fe+ i - Xfe 1 1 2 < V/(xfe)(xfc - Xfc+i) + ^(x/c) - ^(xfc+i) 

< V/(x fe )(x fe - X fe+ l) + Pfe(x fe - X fe+ l), 

where pk £ 9"0( x fe)- Taking norms yields 

||x fe+x -x fc || < (llgfell + ||p fc ||)/a. (2.4) 

By Theorem 23.4 and Corollary 24.5.1 in [5D] and by the compactness of C, there 
exists a constant c, independent of Xfe £ C, such that ||gfe|| + ||pfe|| < c. Consequently, 
we have 

||x fc+ i - Xfe || < c/a. 

Since JC is compact and C lies in the interior of JC, the distance S from C to the 
boundary of JC is positive. Choose (3 £ (0,oo) so that c/(3 < 5. Hence, when a > j3, 
Xfe + i £ JC since Xfe £ C. 

Let A denote the Lipschitz constant for / on JC and suppose that a > f3. Since 
Xfe £ C C JC and ||xfe +1 — Xfe|| < 5, we have Xfe +1 £ JC. Moreover, due to the convexity 
of JC, the line segment connecting Xfe and Xfe +1 lies in JC. Proceeding as in [24] . a 
Taylor expansion around Xfe yields 

/(xfe+i) < /(xfe) + V/(xfe)(x fc+1 - Xfe) + .5A||xfe +1 - Xfef. 

Adding ?/;(xfc+i) to both sides, we have 

<Kx fe+ i) < *fe(x fc+ i) + (.5A - a)||xfe+i - Xfc|| 2 (2.5) 

< $fe(x fe ) + (.5A - a) ||x fe+1 - Xfe|| 2 
= 0(xfe) + (.5A - a) ||x fc+ i - Xfe || 2 

< 0f + (.5A - a) ||xfe +1 - Xfe || 2 since 0(x fc ) < 

< - cra||xfc + i - Xfe || 2 if .5A - a < -act. 
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Hence, the proposition holds with 

5 = max |^_A_|. 

□ 

Remark 1. Suppose <f>^ < </>(xi). In Step 2 of SpaRSA, x^ + i is chosen so that 
0(xfc+i) < tfik'. Hence, there exists </>j? +1 such that <fi(x.k+i) < 4>k+i ^ </>( x i)- ^ n other 
words, if the hypothesis "(f>('X-k) < 4>k' — <?K x i)" of Provosition [2~1\ is satisfied at step 
k, then a choice for exists which satisfies this hypothesis at step k + 1. 

Remark 2. We now show that the GLL reference value </>™ ax satisfies the condi- 
tion 0(xfe) < (f>^ < </>(xi) of Provosition [2~J\ for each k. The condition (j>™ ax > </>(xfc) 
is a trivial consequence of the definition of </>™ ax . Also, by the definition, we have 
0f ax = 0(xi). For k > 1, 0(x fe+ i) < 0' fc nax according to Step 2 of SpaRSA. Hence, 
^max j s a d ecreas ing function of k. In particular, </>™ ax < ~ <X x i)- 

3. Convergence estimate for convex functions. In this section we give a 
sublinear convergence estimate for the error in the objective function value 4>(x.k) 
assuming / is convex and the assumptions of Proposition 12 . 1 1 hold . 

By (Al) and (A2), has a solution x* £ £ and an associated objective 

function value <f>* :— </>(x*). The convergence of the objective function values to 
<j>* is a consequence of the analysis in [53] : 

Lemma 3.1. If (Al) and (A2) hold and <f>^ = 0" lax /or every k, then 

lim 0(x fc ) = (f>*. 
k — >oc 



Proof. By [M] Lemma 4], the objective function values 4>( K k) approach a limit 
denoted <j>. By [2H Theorem 1], all accumulation points of the iterates x& are sta- 
tionary points. An accumulation point exists since /C is compact and the iterates are 
all contained in C C /C, as shown in Remark [2j Since / and ip are both convex, a 
stationary point is a global minimizer of <fi. Hence, <f> = <f>* . □ 

Our sublinear convergence result is the following: 

Theorem 3.2. If (Al) and (A2) hold, f is convex, and cj>£ = 0™ ax for all k, 
then there exist constants a and b such that 

for k sufficiently large. 

Proof. By (|2.5|) with k + 1 replaced by A:, we have 

0(x fe ) < $ fe _i(x fe ) + 6 ||s fe || 2 , 6 = .5A, (3.1) 
where Sk = Xfe — x,t_i. Since x/j minimizes and / is convex, it follows that 

$fc_i(x fe ) = min{/(x fe _i) + V/(x fc _i)(z - x fc _i) + a k -i\\z - Xk-i\\ 2 + V>( z )} 
zeR" 

< min{/(z) + V(z) + a fc _x ||z - x fc _! || 2 : z e K™} 

= min{0(z) + a k -i ||z - x fc _x || 2 : z £ M™}, (3.2) 
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where a^-i is the terminating value of a at step k — 1. Combining (|3.1|) and (|3.2D 

gives 

0(x fc ) < min{0(z) + /3||z - x^f : z e M"} + 6 ||s fe || 2 , (3.3) 

where j3 — rja is an upper bound for the a k implied by Proposition 12.11 By the 
convexity of cj> and with z = (1 — A)xfc_i + Ax* for any A G [0, 1], we have 



* 1 1 2 
2 



min 0(z) + (3\\z - x fe _i || 2 < 0((1 - A)x fe _i + Ax*) + /3A 2 ||x fc _i - x 

< (1 - A)^(xfc_i) + A0* + /3A 2 ||x fc _! - x 
= (l-A)0(x fc _ 1 ) + A0*+fe fe A 2 , 

where = /3||xfe_i — x*|| 2 . Combining this with (|3.3p yields 

#x fc ) < (1 - A)0(x fc _i) + A0* + 6 fc A 2 + 6 ||s fe || 2 

< (1 - \)4>%_ x + W + b k \ 2 + 6 ||s fe || 2 (3.4) 

for any A £ [0, 1]. Define 

fa = max{^(x t ) : (* - l)M <k< iM} = (j>f M , (3.5) 

and let ki denote the index k where the maximum is attained. Since 4>{x.k+i) < <j>k 
in Step 2 of SpaRSA, it follows that <f>^ = (/>™ ax is a nonincreasing function of k. By 
()3.4p with k = ki and by the monotonicity of (f)^, we have 

& < (l-A)^_ 1 +A0*+& fe ,A 2 + 6ol!s fcl || 2 (3.6) 

for any A 6 [0, 1]. Since both Xfc_i and x* lie in C, it follows that 

b k = 0\\x k -i - x* || 2 < /3(diameter of C) 2 := b 2 < oo. (3.7) 

Step 2 of SpaRSA implies that 

H\\ 2 <(4$-i-<Kxk))/bi 

where b\ = aa mm . We take k = fcj and again exploit the monotonicity of </>j? to obtain 

||s fc J 2 < -&)/&i. (3.8) 

Combining (|3.6p - (|3.8[) gives 

4k < (1 - A)&-i + A0* + 6 2 A 2 + b 3 (0i-i - 6a = 60/61, (3.9) 
for every A G [0, 1], The minimum on the right side is attained with the choice 

t-H 1 -^}- (3 ' l0) 

As a consequence of Lemma T3.1[ 0j_i converges to 0*. Hence, the minimizing A also 
approaches as i tends to 00. Choose k large enough that the minimizing A is less 
than 1. It follows from (|3.9p that for this minimizing choice of A, we have 

<t>i < <t>i-i - ^~\7 ^ + 63(^-1 - (3.11) 
409 
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Define = 4n — <\>* . Subtracting (p* from each side of p. lip gives 

&i < ei-i - ef_ 1 /(46 2 ) + 63(64-1 - e,-) 
= (1 + 63)6^1 - e?_ x /(46 2 ) - b 3ei . 

We arrange this to obtain 

e» < e,_i - 6 4 ef_ 1 where 64 = — ■ , . ■ (3-12) 

462(1 + 63) 

By p,12[) e, < e»_i, which implies that 



e, < e 4 _i - 6 4 ej_iej or < 



1 + &4e l -i 

We form the reciprocal of this last inequality to obtain 

1 1 

- > + 6 4 - 

Applying this inequality recursively gives 
1 1 

— > h (i - JJ64 or ej < 



1 + (i - i)64ej 



where j is chosen large enough to ensure that the minimizing A in (|3.10j) is less than 
1 for all i > j. 

Suppose that k € ((i — 1)M, zM] with i > j. Since i > fc/M, we have 
0(x*) - 0* < ej < - , , ej .,, < 



1 + (i — fyb^ej 1 — jb&ej + kb^ej/M 

The proof is completed by taking a = M/64 and 6 = M / (b^ej) — Mj. □ 

4. Convergence estimate for strongly convex functions. In this section we 
prove that SpaRSA converges R-linearly when / is a convex function and <f> satisfies 

<Xy) >0(x*)+/i||y-x*|| 2 (4.1) 

for all y 6 M. n , where /i > 0. Hence, x* is a unique minimizer of <j>. For example, if / 
is a strongly convex function, then (14. ip holds. 



Theorem 4.1. If (Al) and (A2) hold, f is convex, cj> satisfies \4-l\ , and <j>£ = 
0max j Qr ever y ^ then there exist constants 6 £ (0, 1) and c such that 

0(x fc ) - 0* < c0 fc (0( Xl ) - 0*) (4.2) 

for every k. 

Proof. Let 4>i be defined as in (|3 . 5[) . We will show that there exist 7 £ (0, 1) such 

that 

<k -4>*< 7O&-1 -<£*)• (4-3) 
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0< Cl <min<j -L-iLl. (4.4) 



Let C\ be chosen to satisfy the inequality 

We consider 2 cases. 

Case 1. ||s fe J 2 > ci0 4 _i - </>*). 
By (|3.8[) . we have 

ci(^i-i - 0*) < - <pi)/h. 
This can be rearranged to obtain 

<k -<!>*< (l-&ici)(</>i-i -</>*), 

which yields (|4.3[) . 

Case 2. ||s fc J 2 < ci(&_x - cj>*). 
We utilize the inequality (|3 . 6[) but with different bounds for the and terms. 
For fc G ((i - 1)M, iM), we have 

b k := - x*|| 2 < ^(0(xfc_!) - 0*) < - 0*) 

< -(^-r)M - 0*) = H4>i-i - 0*), = -. 

The first inequality is due to (|4.1|) and the last inequality is since 4>k 1S monotone 
decreasing. By the definition of ki below (|3.5[) . it follows that ki e ((i — 1)M, iM] and 

6fc* < - 0*). (4-5) 

Inserting in (|3 . 6[) the bound (|4.5[) and the Case 2 requirement ||sfe i || 2 < ci(</>i_i — 4>*) 
yields 

& < (1 - A)0i_i + A0* + 65(^-1 - 0*)A 2 + W^-i - 0*) 
for all A G [0, 1]. Subtract <j>* from each side to obtain 

e l <[l + b c l -\ + b 5 \ 2 ]e t - l (4.6) 

for all A G [0,1]. 

The A G [0, 1] which minimizes the coefficient of &i-\ in (|4.6|) is 

A = min { 1 'i. 

If the minimizing A is 1, then 65 < 1/2 and the minimizing coefficient in (14. 6|) is 

7 = 6 C! + 65 < 6 ci + 1/2 < 1 

since ci < 1/(26q) by (]4.4[) . On the other hand, if the minimizing A is less than 1, 
then 65 > 1/2 and the minimizing coefficient is 

7 = 1 + 6 ci - — < 1 
4t> 5 
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since l/(46 5 ) = m/(4/3) > b c 1 by This completes the proof of l[3~3"|) . 

For fc e ((i - 1)M, iM], we have 

0(x fc ) - ^* < ei < 1 l ~ 1 e l < - ^ M ) k (0( Xl ) - 0*). 

Hence, (j4.2[) holds with c = 1/7 and = ^ X I M . This completes the proof. □ 

Remark 3. The condition when combined with j^.ffp shows that the iterates 
Xfe converge R-linearly to x*. 

5. More general reference function values. The GLL reference function 
value 0™ ax , defined in (|2.2|) , often leads to greater efficiency when M > 1, when 
compared to the monotone choice M = 1. In practice, it is found that even more 
flexibility in the reference function value can further accelerate convergence. In |15) 
we prove convergence of the nonmonotone gradient projection method whenever the 
reference function </>^ satisfies the following conditions: 

(Rl) 4>f = 0(xi). 

(R2) (f(x k ) < (f>% < max{0f_ 1 , <^ nax } for each fc > 1. 
(R3) <j>% < </>™ ax infinitely often. 

In [15] we provide a specific choice for 0f which satisfies (R1)-(R3) and which 
gave more rapid convergence than the choice <j)^ = </)™ ax . To satisfy (R3), we could 
choose an integer L > and simply set ^ = </)™ ax every L iterations. Another 
strategy, closer in spirit to what is used in the numerical experiments, is to choose a 
decrease parameter A > and set (/>£ = (/>™ ax if 0(xfc_i) — (j)(xk) < A. We now give 
convergence results for SpaRSA whenever the reference function value satisfies (Rl)~ 
(R3). In the first convergence result which follows, convexity of / is not required. 

Theorem 5.1. // (Al) and (A2) hold and the reference function value cf>^ satisfies 
(R1)-(R3), then the iterates x^ of SpaRSA have a subsequence converging to a limit 
x satisfying e d<fi(5c.). 

Proof. We first apply Proposition 12.11 to show that Step 2 of SpaRSA is fulfilled 
for some choice of j. This requires that we show <j>^ < 0(xi) for each fc. This holds 
for = 1 by (Rl). Also, for k — 1, we have 0™ ax = </>(xi). Proceeding by induction, 
suppose that <pf- < </>(xi) and </)™ ax < </>(xi) for i = 1, 2, fc. By Proposition 12. 11 
Step 2 of SpaRSA terminates at a finite j and hence, 

<Mx fc+ i) < <t>* < 0(x x ). 

It follows that 4>'k+i < 0( x i) and </>f +1 < max {0^0fe+?} < <H x i)- Tms completes 
the induction step, and hence, by Proposition 12. 1[ it follows that in every iteration, 
Step 2 of SpaRSA is fulfilled for a finite j. 
By Step 2 of SpaRSA, we have 

0(xfc) < <j)k-l - CTamin||Sfe|| 2 , 

where = x& — x^-i. In the third paragraph of the proof of Theorem 2.2 in [15] . 
it is shown that when an inequality of this form is satisfied for a reference function 
value satisfying (R1)-(R3), then 

lim inf ||sfc|| =0. 

k — >oc 
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Let ki denote a strictly increasing sequence with the property that tends to 
and Xfc ; approaches a limit denoted x. That is, 

lim = and lim = x. 

i — >oo i — >oo 

Since Sfc 4 tends to 0, it follows that x^-i also approaches x. By the first-order 
optimality conditions for x/^ , we have 

0eV/(x^ 1 ) + 2a ki (x^ -Xfc,_i)+a^(x fct ), (5-1) 

where denotes the value of a in Step 2 of SpaRSA associated with Xfc 4 . Again, 
by Proposition 12. 11 we have the uniform bound a/ Ci < /3 = rja. Taking the limit as i 
tends to oo, it follows from Corollary 24.5.1 in [20] that 

e V/(x) + 9^(x). 

This completes the proof. □ 

With a small change in (R3), we obtain either sublinear or linear convergence of 
the entire iteration sequence. 

Theorem 5.2. Suppose that (Al) and (A2) hold, f is convex, the reference 
function value satisfies (Rl) and (R2), and there is L > with the property that 
for each k, 

4>f < 0™ ax for some j e[k,k + L). (5.2) 
Then there exist constants a and b such that 

for k sufficiently large. Moreover, if <f> satisfies the strong convexity condition 
then there exists 9 G (0, 1) and c such that 

<Kx fc ) - <P* < c0*(0(xi) - (j>*) 

for every k. 

Proof. Let ki, i — 1,2,..., denote an increasing sequence of integers with the 
property that <f)f < </>™ ax for j — ki and <f)f < <t>f-± when ki < j < k i+ i. Such a 
sequence exists since (j)^ < max{0^_ 1 , 0™ ax } for each k and (|5.2[) holds. Moreover, 
ki-i-i — ki < L. Hence, we have 

4f < 4& < C"> when h<3< ki+i- (5.3) 

Let us define 

^ ax + = maxj^x^, : < i < min(j, M + L)}. 

Given j, choose ki such that j 6 [fcj, fcj+i). Since j — ki < L, the set of function 
values maximized to obtain </>™ ax is contained in the set of function values maximized 
to obtain 0™ ax+ and we have 



10 



W. W. HAGER, D. T. PHAN, H. ZHANG 



Combining (fOj) and (GE3} yields <pf < 0™ ax+ for each j. In Step 2 of SpaRSA, the 
iterates are chosen to satisfy the condition 

</>(x fc+ i) <4>k - Hl x fc+i - XfeU 2 . 

It follows that 

0(xfc +1 )<^ ax +-<7a||x fe+1 -x fc || 2 . 

Hence, the iterates also satisfy the GLL condition, but with memory of length M + L 
instead of M. By Theorem 13.21 the iterates converge at least sublinearly. Moreover, 
if the strong convexity condition (14. ip holds, then the convergence is R-linear by 
Theorem O □ 

6. Computational experiments. In this section, we compare the performance 
of SpaRSA with the GLL reference function value </>™ ax and the BB choice for ao 
in SpaRSA, to that of an adaptive implementation based on the reference function 
value (j>^ given in the appendix of [15] and a cyclic BB choice for ao- We call this 
implementation Adaptive SpaRSA. This adaptive choice for <j)j£ satisfies (R1)-(R3) 
which ensures convergence in accordance with Theorem 15.11 By a cyclic choice for 
the BB parameter (see E] [T3l US]), we mean that ao = a^ B is reused for several 
iterations. More precisely for some integer m > 1 (the cycle length), and for all 
k £ ((i — l)m, im], the value of ao at iteration k is given by 

(a )fc = af^ 1)m+1 . 

The test problems are associated with applications in the areas of signal process- 
ing and image reconstruction. All experiments were carried out on a PC using Matlab 
7.6 with a AMD Athlon 64 X2 dual core 3 Ghz processor and 3GB of memory run- 
ning Windows Vista. Version 2.0 of SpaSRA was obtained from Mario Figueiredo's 
webpage |http://www. rx.it.pt/~mtf/SpaRSA/] ). The code was run with default pa- 
rameters. Adaptive SpaRSA was written in Matlab with the following parameter 
values 

amin = 10" 30 , a max = 10 30 , r) = 5, a = 10" 4 , M = 10. 

The test problems, such as the basis pursuit denoising problem ()1.2p . involve a pa- 
rameter r. The choice of the cycle length was based on the value of r: 

m = 1 if r > 10~ 2 , otherwise m = 3. 

As r approaches zero, the optimization problem becomes more ill conditioned and the 
convergence speed improves when the cycle length is increased. 

The stopping condition for both SpaRSA and Adaptive SpaRSA was 

a fe ||x fc+ i - XfeHoo < e, 

where at denotes the final value for a in Step 2 of SpaRSA, || • ||oo is the max-norm, 
and e is the error tolerance. This termination condition is suggested by Vandenberghe 
in [22]. As pointed out earlier, x^ is a stationary point when Xfc + i = Xfc. For other 
stopping criteria, see [16] or [24]. In the following tables, "Ax" denotes the number 
of times that a vector is multiplied by A or A T , "cpu" is the CPU time in seconds, 
and "Obj" is the objective function value. 



GRADIENT-BASED METHODS FOR SPARSE RECOVERY 



11 



6.1. £2—^1 problems. We compare the performance of Adaptive SpaRSA with 
SpaRSA by solving £ 2 — £\ problems of form (|1.2p using the randomly generated data 
introduced in [TTl |2"4"] . The matrix A is a random k x n matrix, with k — 2 8 and 
n = 2 10 . The elements of A are chosen from a Gaussian distribution with mean zero 
and variance l/(2n). The observed vector is b = Ax true + n, where the noise n is 
sampled from a Gaussian distribution with mean zero and variance 10 -4 . Xt rue is a 
vector with 160 randomly placed ±1 spikes with zeros in the remaining elements. This 
is a typical sparse signal recovery problem which often arises in compressed sensing 
[IT] . We solved the problem (|1.2[) corresponding to the error tolerance 1CP 5 with 
different regularization parameters r between 10 _1 and 10 -5 . Table [6J] reports the 
average cpu times (seconds) and the number of matrix-vector multiplications over 
10 runs for both the original SpaRSA algorithm and an implementation based on a 
continuation method (see [16]). The implementations using the continuation method 
are indicated by "/c" in Table 16.11 These results show that the Adaptive SpaRSA 
is significantly faster than SpaSRA when not using the continuation technique. The 
performance gap decreases when the continuation technique is applied. Nonetheless, 
Adaptive SpaRSA yields better performance. 

Figure 16.11 plots error versus the number of matrix- vector multiplication for r = 
10 -4 and the implementation without continuation. When the error is large, both al- 
gorithm have the same performance. As the error tolerance decreases, the performance 
of the adaptive algorithm is significantly better than the original implementation. 

Table 6.1 

Average over 10 runs for £2 — £1 problems 



T 


le-1 


le-2 


le-3 


le-4 


le-5 




Ax 


cpu 


Ax 


cpu 


Ax 


cpu 


Ax 


cpu 


Ax 


cpu 


SpaRSA 


65.3 


.07 


706.4 


.56 


3467.5 


2.73 


8802.9 


6.86 


5925.5 


4.65 


Adaptive 


65.4 


.07 


582.8 


.44 


1998.8 


1.58 


4394.0 


3.50 


2911.9 


2.36 


SpaRSA/c 


65.3 


.07 


626.7 


.48 


2172.1 


1.67 


684.9 


.52 


474.8 


.36 


Adaptive/c 


65.4 


.07 


569.0 


.44 


1928.3 


1.51 


636.0 


.50 


453.7 


.34 



6.2. Image deblurring problems. In this subsection, we present results for 
two image restoration problems based on images referred to as Resolution and Cam- 
eraman. The images are 256 x 256 gray scale images; that is, n = 256 2 = 65536. 
The images are blurred by convolution with an 8 x 8 blurring mask and normally dis- 
tributed noise with standard deviation 0.0055 is added to the final signal (see problem 
701 in [UJ). The image restoration problem has the form (|1.2p where r = 0.00005 
and A = HW is the composition of the blur matrix and the Haar discrete wavelet 
transform (DWT) operator. For these test problems, the continuation approach is no 
faster, and in some cases significantly slower, than the implementation without contin- 
uation. Therefore, we solved these test problems without the continuation technique. 
The results in Table 16.21 again indicate that the adaptive scheme yields much better 
performance as the error tolerance decreases. 

6.3. Group-separable regularizer. In this subsection, we examine perfor- 
mance using the group separable regularizers [24: for which 

n 

V-(x) = ||x H || 2 , 

i=l 
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Fig. 6.1. Number of matrix-vector multiplications versus error 
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Fig. 6.2. Deblurring the resolution image 
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Fig. 6.3. Deblurring the cameraman image 



where xm , x™ , . . . , xr m i are m disjoint subvectors of x. The smooth part of <f> can be 
expressed as /(x) = |||Ax — b|| 2 , where A S R 1024x4096 W as obtained by orthonor- 
malizing the rows of a matrix constructed in Subsection I6.ll The true vector x true has 
4096 components divided into m = 64 groups of length U — 64. x trtle is generated by 
randomly choosing 8 groups and filling them with numbers chosen from a Gaussian 
distribution with zero mean and unit variance, while all other groups are filled with 
zeros. The target vector is b = Axt rU e + n, where n is Gaussian noise with mean 



Table 6.2 
Deblurring images 



error 


lc-2 


lc-3 


lc-4 


le-5 




Ax 


cpu 


Obj 


Ax 


cpu 


Obj 


Ax 


cpu 


Obj 


Ax 


cpu 


Obj 


Resolution 


SpaRSA 


49 


2.57 


.4843 


88 


4.80 


.3525 


458 


24.74 


.2992 


1679 


88.27 


.2970 


Adaptive 


37 


1.93 


.5619 


73 


4.02 


.3790 


316 


17.28 


.2981 


681 


35.90 


.2970 


Cameraman 


SpaRSA 


34 


1.66 


.3491 


77 


3.99 


.2181 


332 


17.08 


.1880 


1356 


69.45 


.1868 


Adaptive 


35 


1.71 


.3380 


63 


3.31 


.2232 


215 


11.20 


.1880 


599 


31.4 


.1868 
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Fig. 6.4. Group-separable reconstruction 

zero and variance 10~ 4 . The regularization parameter is chosen as suggested in |24j : 
t = 0.3||A T b|| oo . We ran 10 test problems with error tolerance = 10~ 5 and compute 
the average results. Adaptive SpaRSA solved the test problem in 0.8420 seconds with 
67.4 matrix/vector multiplications, while the SpaRSA obtained similar performance: 
0.8783 seconds and 69.1 matrix/vector multiplications. Figure l6~4l shows the result 
obtained by both methods for one sample. 

6.4. Total-variation phantom reconstruction. In this experiment, the im- 
age is the Shepp-Logan phantom of size 256 x 256 (see OE]). The objective function 
was 

0(x) = i||A(x)-b|| 2 + .OlTV(x) 

where A is a 6136 x 256 2 matrix corresponding to 6136 locations in the 2D Fourier 
plane (masked_FFT in Matlab). The total variation (TV) regularization is defined as 
follows 

TV(x) = ]T V / (A?x) 2 + (AVx) 2 

i 

where A' 1 and are linear operators corresponding to horizontal and vertical first 
order differences (see [4]). As seen in Table ROl Adaptive SpaRSA was faster than 
the original SpaRSA when the error tolerance was sufficiently small. 

Table 6.3 
Total-variation phantom reconstruction 



error 


le-2 


le-3 


le-4 




Ax cpu Obj 


Ax cpu Obj 


Ax cpu Obj 


SpaRSA 
Adaptive 


14 2.55 36.7311 
14 2.57 36.7311 


143 30.06 14.7457 
136 27.32 14.6840 


2877 938.25 14.1433 
731 185.62 14.1730 
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Estimate with err = 1e-2 



Estimate with err = 1e-4 





Fig. 6.5. Phantom reconstruction 



7. Conclusions. The convergence properties of the SpaRSA algorithm (Sparse 
Reconstruction by Separable Approximation) of Wright, Nowak, and Figueiredo [Mj 
are analyzed. We establish sublinear convergence when <j> is convex and the GLL 
reference function value [14] is employed. When <f> is strongly convex, the convergence 
is R-linear. For a reference function value which satisfies (R1)-(R3), we prove the 
existence of a convergent subsequence of iterates that approaches a stationary point. 
For a slightly stronger version of (R3), given in (|5.2[) . we show that sublinear or linear 
convergence again hold when <f> is convex or strongly convex respectively. In a series 
of numerical experiments, it is shown that an Adaptive SpaRSA, based on a relaxed 
choice of the reference function value and a cyclic BB iteration Q~5] , often yields 
much faster convergence, especially when the error tolerance is small. 
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